ABSTRACT

Over the last decade, kernel methods for nonlinear processing have
been used successfully in the machine learning community. However,
the emphasis so far has been on batch techniques. Only recently have
online adaptive techniques been considered in the context of signal
processing tasks. To the best of our knowledge, no kernel-based
strategy has been developed so far that is able to deal with complex
valued signals. In this paper, we take advantage of a technique called
complexification of real RKHSs to attack this problem. In order to
derive gradients and subgradients of operators that need to be defined
on the associated complex RKHSs, we employ the powerful tool of
Wirtinger's calculus, which has recently attracted much attention in
the signal processing community. Wirtinger's calculus simplifies
computations and offers an elegant tool for treating complex signals.
To this end, the notion of Wirtinger's calculus is extended, for the
first time, to include complex RKHSs, and is then used to derive the
Complex Kernel Least-Mean-Square (CKLMS) algorithm. Experiments
verify that the CKLMS can be used to derive stable nonlinear
algorithms, which offer significant performance improvements over the
traditional complex LMS or Widely Linear complex LMS (WL-LMS)
algorithms when dealing with nonlinearities.

1. INTRODUCTION 

Processing in Reproducing Kernel Hilbert Spaces (RKHSs) in the
context of online adaptive processing is gaining in popularity within
the signal processing community [1, 2, 3, 4, 5, 6]. The main advantage
of mobilizing the tool of RKHSs is that the original nonlinear task is
"transformed" into a linear one, where one can employ an easier
"algebra". Moreover, different types of nonlinearities can be treated
in a unifying way that does not affect the derivation of the
algorithms, except at the final implementation stage. The main
concepts of this procedure can be summarized in the following two
steps: 1) map the finite dimensional input data from the input space
$F$ (usually $F \subseteq \mathbb{R}^\nu$) into a higher dimensional
(possibly infinite dimensional) RKHS $\mathcal{H}$, and 2) perform
linear processing (e.g., adaptive filtering) on the mapped data in
$\mathcal{H}$. The procedure is equivalent to nonlinear processing
(nonlinear filtering) in $F$.

An alternative way of describing this process is through the
popular kernel trick [7, 8]: given an algorithm that is formulated
in terms of dot products, one can construct an alternative algorithm
by replacing each one of the dot products with a positive definite
kernel $\kappa$. The specific choice of kernel implicitly defines an
RKHS with an appropriate inner product. Furthermore, the choice of
kernel also defines the type of nonlinearity that underlies the model
to be used. The main representatives of this class of algorithms are
the celebrated support vector machines (SVMs), which have dominated
the research in machine learning over the last decade. Besides SVMs
and the more recent applications in adaptive filtering, there is a
plethora of other scientific domains that have gained from adopting
kernel methods (e.g., image processing and denoising [9, 10],
principal component analysis [11], clustering [12], etc.).



In this paper, we focus on the recently developed Kernel Least
Mean Squares (KLMS) algorithm, which is the LMS algorithm in RKHSs
[13]. KLMS, like all known kernel methods that use real-valued
kernels, is able to deal with real valued data sequences only. To our
knowledge, no kernel-based strategy has been developed so far that is
able to deal effectively with complex valued signals. The main
contributions of this paper are: a) the development of a wide
framework that allows real-valued kernel algorithms to be extended to
treat complex data effectively, taking advantage of a technique
called complexification of real RKHSs, b) the extension of
Wirtinger's calculus to complex RKHSs as a means for the elegant and
efficient computation of the gradients that are involved in many
adaptive filtering algorithms, and c) the development of the Complex
Kernel LMS (CKLMS) algorithm, by exploiting the extended Wirtinger's
calculus. Wirtinger's calculus [14] has recently been enjoying
increasing popularity, mainly in the context of Widely Linear complex
adaptive filters [15, 16, 17, 18, 19, 20, 21, 22], providing a tool
for the derivation of gradients in the complex domain.



The paper is organized as follows. We start with a minimal
introduction to RKHSs in Section 2, before we briefly review the
KLMS algorithm in Section 3. In Section 4 we describe the
complexification procedure of a real RKHS, which provides the main
framework for complex kernel methods based on the popular real valued
reproducing kernels (e.g., Gaussian, polynomial, etc.). The main
notions of the extended Wirtinger's calculus are summarized in
Section 5 and the CKLMS is developed thereafter in Section 6.
Finally, experimental results and conclusions are provided in
Sections 7 and 8, respectively. We will denote the sets of all
natural, real and complex numbers by $\mathbb{N}$, $\mathbb{R}$ and
$\mathbb{C}$, respectively. Vector or matrix valued quantities appear
in boldfaced symbols.



2. REPRODUCING KERNEL HILBERT SPACES 

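We start with some basic definitions regarding RKHSs. Let $X$ be
a non-empty set with $\mathbf{x}_1, \ldots, \mathbf{x}_N \in X$.
Consider a Hilbert space $\mathcal{H}$ of real valued functions, $f$,
defined on the set $X$, with a corresponding inner product
$\langle \cdot, \cdot \rangle_{\mathcal{H}}$. We will call
$\mathcal{H}$ a Reproducing Kernel Hilbert Space (RKHS), if there
exists a function, known as kernel,
$\kappa : X \times X \to \mathbb{R}$, with the following two
properties:

1. For every $\mathbf{x} \in X$, $\kappa(\mathbf{x}, \cdot)$ belongs to $\mathcal{H}$.

2. $\kappa$ has the so called reproducing property, i.e.
$f(\mathbf{x}) = \langle f, \kappa(\mathbf{x}, \cdot) \rangle_{\mathcal{H}}$,
for all $f \in \mathcal{H}$. In particular:

$$\kappa(\mathbf{x}, \mathbf{y}) = \langle \kappa(\mathbf{x}, \cdot), \kappa(\mathbf{y}, \cdot) \rangle_{\mathcal{H}}.$$

It can be shown that the kernel $\kappa$ generates the entire space
$\mathcal{H}$, i.e.
$\mathcal{H} = \overline{\operatorname{span}}\{\kappa(\mathbf{x}, \cdot) \mid \mathbf{x} \in X\}$.
There are several kernels that are used in practice (see [7]). Among
the most widely used are the polynomial kernel,
$\kappa(\mathbf{x}, \mathbf{y}) = (1 + \mathbf{x}^T \mathbf{y})^d$, $d \in \mathbb{N}$,
and the Gaussian kernel,
$\kappa(\mathbf{x}, \mathbf{y}) = \exp(-\|\mathbf{x} - \mathbf{y}\|^2 / \sigma^2)$, $\sigma > 0$.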
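As a small illustration, the two kernels above can be written directly from their definitions. This is only a sketch: the function names and the default values of `d` and `sigma` are illustrative choices, not taken from the paper.

```python
import math

def polynomial_kernel(x, y, d=2):
    """Polynomial kernel (1 + x^T y)^d for real vectors given as sequences."""
    return (1.0 + sum(a * b for a, b in zip(x, y))) ** d

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel exp(-||x - y||^2 / sigma^2), with sigma > 0."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / sigma ** 2)

x, y = [1.0, 2.0], [0.5, -1.0]
print(polynomial_kernel(x, y, d=2))  # (1 + 0.5 - 2.0)^2 = 0.25
print(gaussian_kernel(x, x))         # 1.0, since ||x - x|| = 0
```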

Although there exist complex reproducing kernels that give
rise to RKHSs of complex valued functions [23], in this paper
we focus our attention on complexifying real valued ones, which
have been extensively studied and contain several popular examples.
Later on (in Section 4), we will show how one can construct
complex RKHSs from real ones, through a technique called
complexification.

3. KERNEL LMS 



In a typical LMS filter the goal is to learn a linear input-output
mapping $f : X \to \mathbb{R}$, $f(\mathbf{x}) = \mathbf{w}^T \mathbf{x}$,
$X \subseteq \mathbb{R}^\nu$, based on a sequence of examples
$(\mathbf{x}(1), d(1)), (\mathbf{x}(2), d(2)), \ldots, (\mathbf{x}(N), d(N))$,
so as to minimize the mean square error,
$E\left[|d(n) - \mathbf{w}^T \mathbf{x}(n)|^2\right]$. To this end, the
gradient descent rationale is employed and, at each time instant
$n = 1, 2, \ldots, N$, the quantity $E[e(n)\mathbf{x}(n)]$, which
determines the gradient of the mean square error, is estimated via its
current measurement, i.e., $\hat{E}[e(n)\mathbf{x}(n)] = e(n)\mathbf{x}(n)$,
where $e(n) = d(n) - \mathbf{w}(n-1)^T \mathbf{x}(n)$ is the error at
instance $n$. It takes a few lines of elementary algebra to deduce
that the update of the unknown parameter vector is
$\mathbf{w}(n) = \mathbf{w}(n-1) + \mu e(n) \mathbf{x}(n)$, where $\mu$
is the step size. If we take the initial value of $\mathbf{w}$ as
$\mathbf{w}(0) = \mathbf{0}$, then the repeated application of the
update equation yields:

$$\mathbf{w}(n) = \mu \sum_{k=1}^{n} e(k) \mathbf{x}(k). \tag{1}$$



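Hence, for the filter output at instance $n$ we have:

$$\hat{d}(n) = \mathbf{w}(n-1)^T \mathbf{x}(n) = \mu \sum_{k=1}^{n-1} e(k)\, \mathbf{x}(k)^T \mathbf{x}(n), \tag{2}$$

for $n = 1, 2, \ldots, N$. Equation (2) is expressed in terms of
inner products only, hence it allows for the application of the kernel
trick. Thus, the filter output of the KLMS at instance $n$ is

$$\hat{d}(n) = \langle \mathbf{x}(n), w(n-1) \rangle = \mu \sum_{k=1}^{n-1} e(k)\, \kappa(\mathbf{x}(n), \mathbf{x}(k)), \tag{3}$$

while

$$w(n) = \mu \sum_{k=1}^{n} e(k)\, \kappa(\mathbf{x}(k), \cdot), \tag{4}$$

for $n = 1, 2, \ldots, N$.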
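The kernelized filter output can be sketched in a few lines of Python. This is a minimal illustration rather than the reference implementation of [13]: the function names, the Gaussian kernel with `sigma = 1`, the step size and the toy data are all assumptions made for the example, and no sparsification is applied.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian kernel exp(-||x - y||^2 / sigma^2) on real vectors."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / sigma ** 2)

def klms(inputs, desired, mu=0.5, kernel=gaussian_kernel):
    """Run KLMS once over the data; return the centers x(k) and the errors e(k)."""
    centers, errors = [], []
    for x, d in zip(inputs, desired):
        # Filter output: d_hat(n) = mu * sum_{k<n} e(k) * kappa(x(n), x(k))
        d_hat = mu * sum(e * kernel(x, c) for e, c in zip(errors, centers))
        errors.append(d - d_hat)   # e(n) = d(n) - d_hat(n)
        centers.append(x)          # the expansion grows by one term per sample
    return centers, errors

# Toy usage (hypothetical data): learn d = sin(x) from samples presented cyclically.
xs = [[0.1 * (k % 30)] for k in range(300)]
ds = [math.sin(x[0]) for x in xs]
centers, errors = klms(xs, ds, mu=0.5)
# Mean squared error over the first and the last 30 samples.
print(sum(e * e for e in errors[:30]) / 30, sum(e * e for e in errors[-30:]) / 30)
```

Note that every processed sample is stored as a center, so the cost of evaluating the output grows linearly with the number of samples seen.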

Another, more formal way of developing the KLMS is the following.
First, we transform the input space $X$ to a high dimensional
feature space $\mathcal{H}$ through the (implicit) mapping
$\Phi : X \to \mathcal{H}$, $\Phi(\mathbf{x}) = \kappa(\mathbf{x}, \cdot)$.
Thus, the training examples become

$$(\Phi(\mathbf{x}(1)), d(1)), \ldots, (\Phi(\mathbf{x}(N)), d(N)).$$

We apply the LMS procedure on the transformed data, with the linear
filter output $\hat{d}(n) = \langle \Phi(\mathbf{x}(n)), w \rangle$.
The model $\langle \Phi(\mathbf{x}), w \rangle$ is more representative
than the simple $\mathbf{w}^T \mathbf{x}$, since it includes nonlinear
modeling through the presence of the kernel. The objective now becomes
to minimize the cost function

$$E\left[|d(n) - \langle \Phi(\mathbf{x}(n)), w \rangle|^2\right].$$

Using the notion of the Fréchet derivative, which has to be
mobilized since the dimensionality of the RKHS may be infinite, we
are able to derive the gradient of the aforementioned cost function
with respect to $w$. It has to be emphasized that now $w$ is not
a vector, but a function, i.e., a point in the linear Hilbert space.
It turns out that the update of the KLMS is given by
$w(n) = w(n-1) + \mu e(n) \Phi(\mathbf{x}(n))$, where
$e(n) = d(n) - \hat{d}(n)$. From this update, following the same
procedure as in LMS and applying the reproducing property, we obtain
equations (3) and (4), which are at the core of the KLMS algorithm.
More details and the algorithmic implementation may be found in [13].

Note that, in a number of attempts to kernelize known algorithms
that are cast in terms of inner products, the kernel trick is usually
applied in a "black box" fashion, without considering the problem in
the RKHS in which the (implicit) processing is carried out. Such an
approach often does not allow for a deeper understanding of the
problem, especially if further theoretical analysis is required.
Moreover, in our case, such a "blind" application of the kernel trick
on a standard complex LMS form can only lead to spaces defined by
complex kernels. Complex RKHSs built around the complexification of
real kernels do not result from a direct application of the standard
kernel trick.

4. COMPLEXIFICATION OF A REAL RKHS 

To generalize the kernel adaptive filtering algorithms to complex
domains, we need a generalized framework regarding complex
RKHSs. In this paper, we employ a simple technique called
complexification of real RKHSs, which has the advantage of allowing
modeling in complex RKHSs using popular well-established real
kernels (e.g., Gaussian, polynomial, etc.).

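Let $X \subseteq \mathbb{R}^\nu$. Define
$X^2 \equiv X \times X \subseteq \mathbb{R}^{2\nu}$ and
$\mathbb{X} = \{\mathbf{x} + i\mathbf{y},\ \mathbf{x}, \mathbf{y} \in X\}$,
equipped with a complex product structure. Let $\mathcal{H}$ be a real
RKHS associated with a real kernel $\kappa$ defined on
$X^2 \times X^2$ and let $\langle \cdot, \cdot \rangle_{\mathcal{H}}$
be its corresponding inner product. Then, every $f \in \mathcal{H}$
can be regarded as a function defined on either $X^2$ or $\mathbb{X}$,
i.e., $f(\mathbf{z}) = f(\mathbf{x} + i\mathbf{y}) = f(\mathbf{x}, \mathbf{y})$.

Next, we define $\mathcal{H}^2 = \mathcal{H} \times \mathcal{H}$. It is
easy to verify that $\mathcal{H}^2$ is also a Hilbert space with inner
product

$$\langle \mathbf{f}, \mathbf{g} \rangle_{\mathcal{H}^2} = \langle f_1, g_1 \rangle_{\mathcal{H}} + \langle f_2, g_2 \rangle_{\mathcal{H}}, \tag{5}$$

for $\mathbf{f} = (f_1, f_2)^T$, $\mathbf{g} = (g_1, g_2)^T$. Our
objective is to enrich $\mathcal{H}^2$ with a complex structure. We
address this problem using the complexification of the real RKHS
$\mathcal{H}$. To this end, we define the space
$\mathbb{H} = \{f = f_1 + i f_2;\ f_1, f_2 \in \mathcal{H}\}$, equipped
with the complex inner product:

$$\langle f, g \rangle_{\mathbb{H}} = \langle f_1, g_1 \rangle_{\mathcal{H}} + \langle f_2, g_2 \rangle_{\mathcal{H}} + i \left( \langle f_2, g_1 \rangle_{\mathcal{H}} - \langle f_1, g_2 \rangle_{\mathcal{H}} \right),$$

for $f = f_1 + i f_2$, $g = g_1 + i g_2$. It is not difficult to verify
that $\mathbb{H}$ is a complex RKHS with kernel $\kappa$ [23]. We call
$\mathbb{H}$ the complexification of $\mathcal{H}$.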
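The complex inner product can be checked numerically on elements that are finite kernel expansions. The sketch below is an illustration under assumptions that are not part of the paper: elements of the real RKHS are stored as lists of (coefficient, center) pairs, a Gaussian kernel with `sigma = 1` is used, and the coefficients and centers are arbitrary.

```python
import math

def gaussian_kernel(x, y, sigma=1.0):
    """Real Gaussian kernel on real vectors (sequences of floats)."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, y)) / sigma ** 2)

def inner_H(f, g, kernel=gaussian_kernel):
    """<f, g> in the real RKHS for f = sum_i a_i k(x_i, .), stored as [(a_i, x_i), ...]."""
    return sum(a * b * kernel(x, y) for a, x in f for b, y in g)

def inner_complexified(f, g, kernel=gaussian_kernel):
    """Complex inner product of f = f1 + i f2 and g = g1 + i g2, each stored as (f1, f2)."""
    (f1, f2), (g1, g2) = f, g
    real = inner_H(f1, g1, kernel) + inner_H(f2, g2, kernel)
    imag = inner_H(f2, g1, kernel) - inner_H(f1, g2, kernel)
    return complex(real, imag)

# Two arbitrary elements of the complexified space (illustrative values).
f = ([(1.0, [0.0, 1.0]), (-0.5, [1.0, 1.0])], [(2.0, [0.5, 0.0])])
g = ([(0.3, [1.0, 0.0])], [(1.0, [0.0, 0.0]), (0.7, [1.0, 2.0])])

print(inner_complexified(f, g))  # generally a complex number
print(inner_complexified(f, f))  # real and non-negative: the squared norm of f
```

Swapping the two arguments conjugates the result, which is the Hermitian symmetry expected of a complex inner product.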

To complete the presentation of the required framework for
working on complex RKHSs, we need a technique to map the sample
data from the complex input space to the complexified RKHS
$\mathbb{H}$. This problem will be addressed in Section 6.

5. WIRTINGER'S CALCULUS IN COMPLEX RKHS 

Wirtinger's calculus [14] has become very popular in the signal
processing community, mainly in the context of complex adaptive
filtering, as a means of computing, in an elegant way, gradients
of real valued cost functions defined on complex domains
($\mathbb{C}^\nu$). Such functions, being real valued, are not
holomorphic (unless they are constant) and therefore the complex
derivative cannot be used. Instead, if we consider that the cost
function is defined on a Euclidean domain with double dimensionality
($\mathbb{R}^{2\nu}$), then the real derivatives may be employed.
The price of this approach is that the computations become
cumbersome and tedious. Wirtinger's calculus provides an alternative
equivalent formulation, which is based on simple rules and principles
and which bears a great resemblance to the rules of the standard
complex derivative.

In the case of a simple non-holomorphic complex function
$T$ defined on $U \subseteq \mathbb{C}$, Wirtinger's calculus
considers two forms of derivatives, the $\mathbb{R}$-derivative and
the conjugate $\mathbb{R}$-derivative, which are defined as follows:

$$\frac{\partial T}{\partial z} = \frac{1}{2}\left(\frac{\partial u}{\partial x} + \frac{\partial v}{\partial y}\right) + \frac{i}{2}\left(\frac{\partial v}{\partial x} - \frac{\partial u}{\partial y}\right),$$

$$\frac{\partial T}{\partial z^*} = \frac{1}{2}\left(\frac{\partial u}{\partial x} - \frac{\partial v}{\partial y}\right) + \frac{i}{2}\left(\frac{\partial v}{\partial x} + \frac{\partial u}{\partial y}\right),$$

where $T(z) = T(x + iy) = T(x, y) = u(x, y) + i v(x, y)$. Note
that any such non-holomorphic function can be written in the form
$T(z, z^*)$, so that for fixed $z^*$, $T$ is $z$-holomorphic and for
fixed $z$, $T$ is $z^*$-holomorphic [24] (assuming of course that
$T(x, y)$ has partial derivatives of any order). This fact underlies
the development of Wirtinger's calculus. Having this in mind,
$\frac{\partial T}{\partial z}$ can be easily evaluated as the
standard complex partial derivative taken with respect to $z$ (thus
treating $z^*$ as a constant). Similarly,
$\frac{\partial T}{\partial z^*}$ is evaluated as the standard complex
partial derivative taken with respect to $z^*$ (thus treating $z$ as a
constant). For example, if $T(z, z^*) = z (z^*)^2$, then

$$\frac{\partial T}{\partial z} = (z^*)^2, \qquad \frac{\partial T}{\partial z^*} = 2 z z^*.$$

Similar principles and rules hold for a function of many complex
variables (i.e., $U \subseteq \mathbb{C}^\nu$) [24].
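The worked example can be verified numerically by approximating the partial derivatives with respect to $x$ and $y$ and combining them as in the two definitions. The sketch below uses central differences; the helper name `wirtinger_derivatives`, the step `h` and the test point are illustrative assumptions.

```python
def wirtinger_derivatives(T, z, h=1e-6):
    """Central-difference estimates of (dT/dz, dT/dz*) at the complex point z."""
    dT_dx = (T(z + h) - T(z - h)) / (2 * h)            # partial derivative along x
    dT_dy = (T(z + 1j * h) - T(z - 1j * h)) / (2 * h)  # partial derivative along y
    # With T = u + iv, these combinations reproduce the two definitions.
    return 0.5 * (dT_dx - 1j * dT_dy), 0.5 * (dT_dx + 1j * dT_dy)

def T(z):
    return z * z.conjugate() ** 2   # T(z, z*) = z (z*)^2

z0 = 1 + 2j
d_z, d_zc = wirtinger_derivatives(T, z0)
print(d_z, z0.conjugate() ** 2)       # both close to (z*)^2
print(d_zc, 2 * z0 * z0.conjugate())  # both close to 2 z z*
```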

Wirtinger's calculus has been developed only for operators
defined on finite dimensional spaces, $\mathbb{C}^\nu$. Hence, this
calculus cannot be used in RKHSs, where the dimensionality of the
function space can be infinite. To this end, Wirtinger's calculus
needs to be generalized to a general Hilbert space, and this is one of
the main contributions of the current paper. A rigorous presentation
of this extension is out of the scope of the paper (due to lack of
space). Nevertheless, we will present the main ideas and results. At
the heart of the generalization lies the notion of Fréchet
differentiability. Consider a Hilbert space $H$ over the field $F$
(typically $\mathbb{R}$ or $\mathbb{C}$). The operator $T : H \to F$
is said to be Fréchet differentiable at $f_0$, if there exists a
$u \in H$, such that

$$\lim_{\|h\|_H \to 0} \frac{T(f_0 + h) - T(f_0) - \langle u, h \rangle_H}{\|h\|_H} = 0, \tag{6}$$

where $\langle \cdot, \cdot \rangle_H$ is the dot product of the
Hilbert space $H$ and $\|\cdot\|_H = \sqrt{\langle \cdot, \cdot \rangle_H}$
is the induced norm. The element $u$ is usually called the gradient of
$T$ at $f_0$.

Since our study involves mainly RKHSs, we will present the
necessary tools in that context. The generalization to a general
Hilbert space has also been developed and follows a similar path.
Consider the spaces $\mathbb{H}$ and $\mathcal{H}^2$ defined in
Section 4. Let $T : \mathbb{H} \to \mathbb{C}$, $T = T_1 + i T_2$, be
the operator we seek to differentiate. Assume that
$\mathbf{T} = (T_1, T_2)^T$,
$T(f) = T(f_1 + i f_2) = T(f_1, f_2) = T_1(f_1, f_2) + i T_2(f_1, f_2)$,
is differentiable as an operator defined on $\mathcal{H}^2$ and let
$\nabla_1 T_1$, $\nabla_2 T_1$, $\nabla_1 T_2$ and $\nabla_2 T_2$ be
the partial derivatives with respect to the first ($f_1$) and the
second ($f_2$) variable, respectively. It turns out (the proof is
omitted due to lack of space) that if $T(f_1, f_2)$ has derivatives of
any order, then it can be written in the form $T(f, f^*)$, where
$f^* = f_1 - i f_2$, so that for fixed $f^*$, $T$ is $f$-holomorphic
and for fixed $f$, $T$ is $f^*$-holomorphic. We may define the
$\mathbb{R}$-derivative and the conjugate $\mathbb{R}$-derivative of
$T$ as follows:

$$\nabla_f T = \frac{1}{2}\left(\nabla_1 T_1 + \nabla_2 T_2\right) + \frac{i}{2}\left(\nabla_1 T_2 - \nabla_2 T_1\right), \tag{7}$$

$$\nabla_{f^*} T = \frac{1}{2}\left(\nabla_1 T_1 - \nabla_2 T_2\right) + \frac{i}{2}\left(\nabla_1 T_2 + \nabla_2 T_1\right). \tag{8}$$

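The following properties can be proved (among others):

1. If $T$ is $f$-holomorphic (i.e., it has a Taylor series expansion
with respect to $f$), then $\nabla_{f^*} T = 0$.

2. If $T$ is $f^*$-holomorphic (i.e., it has a Taylor series
expansion with respect to $f^*$), then $\nabla_f T = 0$.

3. $(\nabla_f T)^* = \nabla_{f^*} T^*$.

4. $(\nabla_{f^*} T)^* = \nabla_f T^*$.

5. If $T$ is real valued, then $(\nabla_f T)^* = \nabla_{f^*} T$.

6. The first order Taylor expansion around $f \in \mathbb{H}$ is given by

$$T(f + h) = T(f) + \langle h, (\nabla_f T(f))^* \rangle_{\mathbb{H}} + \langle h^*, (\nabla_{f^*} T(f))^* \rangle_{\mathbb{H}}.$$

7. If $T(f) = \langle f, w \rangle_{\mathbb{H}}$, then $\nabla_f T = w^*$, $\nabla_{f^*} T = 0$.

8. If $T(f) = \langle w, f \rangle_{\mathbb{H}}$, then $\nabla_f T = 0$, $\nabla_{f^*} T = w$.

9. If $T(f) = \langle f^*, w \rangle_{\mathbb{H}}$, then $\nabla_f T = 0$, $\nabla_{f^*} T = w^*$.

10. If $T(f) = \langle w, f^* \rangle_{\mathbb{H}}$, then $\nabla_f T = w$, $\nabla_{f^*} T = 0$.

11. If $R, S : \mathbb{H} \to \mathbb{C}$ are $f$-analytic and $T = R \cdot S$, then:

$$\nabla_f T = \nabla_f R \cdot S + \nabla_f S \cdot R.$$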

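An important consequence of the above properties is that if $T$
is a real valued operator defined on $\mathbb{H}$, then its first
order Taylor expansion is given by:

$$
\begin{aligned}
T(f + h) &= T(f) + \langle h, (\nabla_f T(f))^* \rangle_{\mathbb{H}} + \langle h^*, (\nabla_{f^*} T(f))^* \rangle_{\mathbb{H}} \\
&= T(f) + \langle h, \nabla_{f^*} T(f) \rangle_{\mathbb{H}} + \left( \langle h, \nabla_{f^*} T(f) \rangle_{\mathbb{H}} \right)^* \\
&= T(f) + 2 \cdot \Re\left[ \langle h, \nabla_{f^*} T(f) \rangle_{\mathbb{H}} \right].
\end{aligned}
$$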
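A one-dimensional sanity check may help here. The sketch below takes the space to be $\mathbb{C}$ itself, with inner product $\langle h, g \rangle = h g^*$, and the real valued cost $T(z) = |z - c|^2$, whose conjugate gradient is $z - c$. These choices, the variable names and the step size `mu` are assumptions made for the illustration. It compares $T(z + h)$ with the first order expansion, and then moves repeatedly against the conjugate gradient.

```python
def cost(z, c):
    """Real valued cost T(z) = |z - c|^2 = (z - c)(z - c)*."""
    return abs(z - c) ** 2

def conj_gradient(z, c):
    """Conjugate Wirtinger gradient of T: treat z as constant and differentiate in z*."""
    return z - c

c, z, h = 2 - 1j, 0.5 + 0.5j, 1e-3 * (1 + 2j)

# First order expansion T(z) + 2 Re[<h, grad>], with <h, g> = h * conj(g).
first_order = cost(z, c) + 2 * (h * conj_gradient(z, c).conjugate()).real
remainder = cost(z + h, c) - first_order
print(remainder, abs(h) ** 2)   # for this quadratic cost the remainder is |h|^2

# Moving against the conjugate gradient decreases the cost.
mu = 0.25
for _ in range(50):
    z = z - mu * conj_gradient(z, c)
print(abs(z - c))               # distance to the minimizer after 50 steps
```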



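However, in view of the Cauchy-Schwarz inequality we have:

$$\Re\left[ \langle h, \nabla_{f^*} T(f) \rangle_{\mathbb{H}} \right] \le \left| \langle h, \nabla_{f^*} T(f) \rangle_{\mathbb{H}} \right| \le \|h\|_{\mathbb{H}} \cdot \|\nabla_{f^*} T(f)\|_{\mathbb{H}}.$$

The equality in the above relationship holds if
$h \propto \nabla_{f^*} T$. Hence, the direction of increase of $T$ is
$\nabla_{f^*} T(f)$. Therefore, any gradient descent based algorithm
minimizing $T(f)$ is based on the update scheme:

$$f_n = f_{n-1} - \mu \cdot \nabla_{f^*} T(f_{n-1}). \tag{9}$$

6. COMPLEX KERNEL LMS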



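Consider the sequence of examples
$(\mathbf{z}(1), d(1)), (\mathbf{z}(2), d(2)), \ldots, (\mathbf{z}(N), d(N))$,
where $d(n) \in \mathbb{C}$, $\mathbf{z}(n) \in V \subseteq \mathbb{C}^\nu$,
$\mathbf{z}(n) = \mathbf{x}(n) + i\mathbf{y}(n)$,
$\mathbf{x}(n), \mathbf{y}(n) \in \mathbb{R}^\nu$, for $n = 1, \ldots, N$.
We map the points $\mathbf{z}(n)$ to the RKHS $\mathbb{H}$ using the
mapping $\boldsymbol{\Phi}$:

$$
\begin{aligned}
\boldsymbol{\Phi}(\mathbf{z}(n)) &= \Phi(\mathbf{z}(n)) + i\,\Phi(\mathbf{z}(n)) \\
&= \kappa\left( (\mathbf{x}(n), \mathbf{y}(n))^T, \cdot \right) + i \cdot \kappa\left( (\mathbf{x}(n), \mathbf{y}(n))^T, \cdot \right),
\end{aligned}
$$

for $n = 1, \ldots, N$. The objective of the complex Kernel LMS is to
minimize $E[\mathcal{L}_n(w)]$, where

$$
\begin{aligned}
\mathcal{L}_n(w) = |e(n)|^2 &= \left| d(n) - \langle \boldsymbol{\Phi}(\mathbf{z}(n)), w \rangle_{\mathbb{H}} \right|^2 \\
&= \left( d(n) - \langle \boldsymbol{\Phi}(\mathbf{z}(n)), w \rangle_{\mathbb{H}} \right) \left( d(n) - \langle \boldsymbol{\Phi}(\mathbf{z}(n)), w \rangle_{\mathbb{H}} \right)^* \\
&= \left( d(n) - \langle w^*, \boldsymbol{\Phi}(\mathbf{z}(n)) \rangle_{\mathbb{H}} \right) \left( d(n)^* - \langle w, \boldsymbol{\Phi}(\mathbf{z}(n)) \rangle_{\mathbb{H}} \right),
\end{aligned}
$$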

at each instance $n$. We then apply the complex LMS to the
transformed data, using the rules of Wirtinger's calculus to compute
the gradient
$\nabla_{w^*} \mathcal{L}_n(w) = -e(n)^* \cdot \boldsymbol{\Phi}(\mathbf{z}(n))$.
Therefore the CKLMS update rule becomes:

$$w(n) = w(n-1) + \mu e(n)^* \cdot \boldsymbol{\Phi}(\mathbf{z}(n)), \tag{10}$$

where $w(n)$ denotes the estimate at iteration $n$.

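Assuming that $w(0) = 0$, the repeated application of the
weight-update equation gives:

$$
\begin{aligned}
w(n) &= w(n-1) + \mu e(n)^* \boldsymbol{\Phi}(\mathbf{z}(n)) \\
&= w(n-2) + \mu e(n-1)^* \boldsymbol{\Phi}(\mathbf{z}(n-1)) + \mu e(n)^* \boldsymbol{\Phi}(\mathbf{z}(n)) \\
&= \mu \sum_{k=1}^{n} e(k)^* \boldsymbol{\Phi}(\mathbf{z}(k)).
\end{aligned}
\tag{11}
$$

Thus, the filter output at iteration $n$ becomes:

$$
\begin{aligned}
\hat{d}(n) &= \langle \boldsymbol{\Phi}(\mathbf{z}(n)), w(n-1) \rangle_{\mathbb{H}} \\
&= \mu \sum_{k=1}^{n-1} e(k) \langle \boldsymbol{\Phi}(\mathbf{z}(n)), \boldsymbol{\Phi}(\mathbf{z}(k)) \rangle_{\mathbb{H}} \\
&= 2\mu \sum_{k=1}^{n-1} e(k)\, \kappa(\mathbf{z}(n), \mathbf{z}(k)) \\
&= 2\mu \sum_{k=1}^{n-1} \Re[e(k)]\, \kappa(\mathbf{z}(n), \mathbf{z}(k)) + 2\mu \cdot i \sum_{k=1}^{n-1} \Im[e(k)]\, \kappa(\mathbf{z}(n), \mathbf{z}(k)),
\end{aligned}
\tag{12}
$$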



where the evaluation of the kernel is done by replacing the complex
vectors $\mathbf{z}(n)$ of $\mathbb{C}^\nu$ with the corresponding real
vectors of $\mathbb{R}^{2\nu}$, i.e.,

$$\mathbf{z}(n) = \mathbf{x}(n) + i\mathbf{y}(n) = (\mathbf{x}(n), \mathbf{y}(n))^T.$$

It can readily be shown that, since the CKLMS is the complex
LMS in RKHS, the important properties of the LMS (convergence
in the mean, misadjustment, etc.) carry over to CKLMS.
Furthermore, we may also define a normalized version, which we call
Normalized Complex Kernel LMS (NCKLMS). The weight-update
of the NCKLMS is given by:

$$w(n) = w(n-1) + \frac{\mu}{2 \cdot \kappa(\mathbf{z}(n), \mathbf{z}(n))}\, e(n)^* \boldsymbol{\Phi}(\mathbf{z}(n)).$$

The NCKLMS algorithm is summarized in Algorithm 1.

Algorithm 1 Normalized Complex Kernel LMS

INPUT: $(\mathbf{z}(1), d(1)), \ldots, (\mathbf{z}(N), d(N))$

OUTPUT: The expansion
$w = \sum_{k=1}^{N} a(k)\, \kappa(\mathbf{z}(k), \cdot) + i \cdot \sum_{k=1}^{N} b(k)\, \kappa(\mathbf{z}(k), \cdot)$.

Initialization: Set $a = \{\}$, $b = \{\}$, $Z = \{\}$ (i.e., $w = 0$).
Select the step parameter $\mu$ and the kernel $\kappa$.

for n = 1:N do

   Compute the filter output:

   $$\hat{d}(n) = \sum_{k=1}^{n-1} (a(k) + b(k)) \cdot \kappa(\mathbf{z}(n), \mathbf{z}(k)) + i \cdot \sum_{k=1}^{n-1} (a(k) - b(k)) \cdot \kappa(\mathbf{z}(n), \mathbf{z}(k)).$$

   Compute the error: $e(n) = d(n) - \hat{d}(n)$.

   $\gamma = 2\kappa(\mathbf{z}(n), \mathbf{z}(n))$.

   $a(n) = \mu(\Re[e(n)] + \Im[e(n)]) / \gamma$.

   $b(n) = \mu(\Re[e(n)] - \Im[e(n)]) / \gamma$.

   Add the new center $\mathbf{z}(n)$ to the list of centers, i.e., add
   $\mathbf{z}(n)$ to the list $Z$, add $a(n)$ to the list $a$, add
   $b(n)$ to the list $b$.

end for
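The steps of Algorithm 1 can be sketched in Python as follows. This is an illustrative transcription rather than the authors' code: the function names, the default `sigma = 5.0` and the toy sample are assumptions, and no sparsification is applied, so every sample becomes a center.

```python
import math

def to_real_vector(z):
    """Map z = x + iy in C^nu to the stacked real vector (x, y)^T in R^(2 nu)."""
    return [c.real for c in z] + [c.imag for c in z]

def gaussian_kernel(x, y, sigma=5.0):
    """Real Gaussian kernel exp(-||x - y||^2 / sigma^2)."""
    return math.exp(-sum((p - q) ** 2 for p, q in zip(x, y)) / sigma ** 2)

def ncklms(samples, mu=0.5, kernel=gaussian_kernel):
    """Normalized complex kernel LMS over (z, d) pairs; z is a list of complex numbers."""
    centers, a, b, errors = [], [], [], []
    for z, d in samples:
        x = to_real_vector(z)
        k_vals = [kernel(x, c) for c in centers]
        # Filter output: sum (a + b) kappa + i * sum (a - b) kappa
        d_hat = complex(
            sum((ak + bk) * kv for ak, bk, kv in zip(a, b, k_vals)),
            sum((ak - bk) * kv for ak, bk, kv in zip(a, b, k_vals)),
        )
        e = d - d_hat
        gamma = 2.0 * kernel(x, x)
        a.append(mu * (e.real + e.imag) / gamma)
        b.append(mu * (e.real - e.imag) / gamma)
        centers.append(x)
        errors.append(e)
    return centers, a, b, errors

# Toy usage: the same sample presented three times.
_, _, _, errors = ncklms([([1 + 1j], 2 - 3j)] * 3, mu=0.5)
print([abs(e) for e in errors])   # each error is half the previous one for mu = 0.5
```

Storing the two real coefficient lists `a` and `b` mirrors the expansion in the OUTPUT line of the algorithm, so only real kernel evaluations are needed.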



6.1. Sparsification 

The main drawback of kernel based adaptive filtering algorithms is
that they require a growing network of training centers
$\mathbf{z}_n$. They start from an empty set (usually called the
dictionary) and gradually add new samples to that set, to form a
summation similar to the one shown in equation (11). This results in
increasing memory and computational requirements as time evolves.
Several strategies have been proposed to cope with this problem and
to produce sparse solutions. In this paper, we employ the well known
novelty criterion [25, 13]. In novelty criterion online
sparsification, whenever a new data pair
$(\boldsymbol{\Phi}(\mathbf{z}_n), d_n)$ is considered, a decision is
immediately made of whether to add the new center
$\boldsymbol{\Phi}(\mathbf{z}_n)$ to the dictionary of centers
$\mathcal{C}$. The decision is reached following two simple rules.
First, the distance of the new center
$\boldsymbol{\Phi}(\mathbf{z}_n)$ from the current dictionary is
evaluated:
$dis = \min_{c_k \in \mathcal{C}} \{\|\boldsymbol{\Phi}(\mathbf{z}_n) - c_k\|_{\mathbb{H}}\}$.
If this distance is smaller than a given threshold $\delta_1$ (i.e.,
the new center is close to the existing dictionary), then the center
is not added to $\mathcal{C}$. Otherwise, we compute the prediction
error $e_n = d_n - \hat{d}_n$. If $|e_n|$ is smaller than a predefined
threshold $\delta_2$, then the new center is discarded. Only if
$|e_n| \ge \delta_2$ is the new center
$\boldsymbol{\Phi}(\mathbf{z}_n)$ added to the dictionary.

Fig. 1. Learning curves for KCLMS ($\mu = 1/2$), CLMS ($\mu = 1/16$)
and WL-CLMS ($\mu = 1/16$) (filter length $L = 5$, delay $D = 2$) in
the nonlinear channel equalization, for the circular input case.

Fig. 2. Learning curves for KNCLMS ($\mu = 1/2$), NCLMS ($\mu = 1/16$)
and WL-NCLMS ($\mu = 1/16$) (filter length $L = 5$, delay $D = 2$) in
the nonlinear channel equalization, for the non-circular input case
($\rho = 0.1$).
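The two rules of the novelty criterion can be sketched as a single decision function. The code below is a hedged illustration: the function names are hypothetical, and the feature-space distance is computed from the kernel using $\langle \boldsymbol{\Phi}(\mathbf{z}), \boldsymbol{\Phi}(\mathbf{c}) \rangle_{\mathbb{H}} = 2\kappa(\mathbf{z}, \mathbf{c})$, which is the identity used in the step from the second to the third line of (12). Treating an empty dictionary as infinitely far away is an assumption of this sketch.

```python
import math

def gaussian_kernel(x, y, sigma=5.0):
    """Real Gaussian kernel on the stacked real vectors (x, y)^T."""
    return math.exp(-sum((p - q) ** 2 for p, q in zip(x, y)) / sigma ** 2)

def feature_distance(x, c, kernel=gaussian_kernel):
    """||Phi(z) - Phi(c)|| in the complexified RKHS, assuming <Phi(z), Phi(c)> = 2 kappa(z, c)."""
    sq = 2.0 * kernel(x, x) + 2.0 * kernel(c, c) - 4.0 * kernel(x, c)
    return math.sqrt(max(sq, 0.0))   # guard against tiny negative rounding errors

def is_novel(x, error, dictionary, delta1, delta2, kernel=gaussian_kernel):
    """Return True if the center x should be added to the dictionary."""
    if dictionary:
        dis = min(feature_distance(x, c, kernel) for c in dictionary)
        if dis < delta1:
            return False             # rule 1: too close to an existing center
    return abs(error) >= delta2      # rule 2: add only if the prediction error is large

dictionary = [[0.0, 0.0]]
print(is_novel([0.0, 0.0], 0.5 + 0.5j, dictionary, 0.15, 0.2))   # False: duplicate center
print(is_novel([10.0, 0.0], 0.5 + 0.5j, dictionary, 0.15, 0.2))  # True: far away, large error
```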

Besides the previous scenario, other scenarios are also possible
that keep the number of updated parameters per recursion fixed.
For example, the sliding window LMS can be used. In [4, 5, 6],
regularization, in the form of projections, has been used to cope
efficiently with the problem. Results under such scenarios are
available and will be presented elsewhere.

7. EXPERIMENTS 

We tested the CKLMS on a nonlinear channel equalization problem
(see Figure 3). The nonlinear channel consists of a linear filter:

$$t(n) = (-0.9 + 0.8i) \cdot s(n) + (0.6 - 0.7i) \cdot s(n-1)$$

and a memoryless nonlinearity

$$q(n) = t(n) + (0.1 + 0.15i) \cdot t^2(n) + (0.06 + 0.05i) \cdot t^3(n).$$

At the receiver end of the channel, the signal is corrupted by white
Gaussian noise and then observed as $r(n)$. The input signal that
was fed to the channel had the form

$$s(n) = 0.70 \left( \sqrt{1 - \rho^2}\, X(n) + i \rho\, Y(n) \right), \tag{13}$$



where $X(n)$ and $Y(n)$ are Gaussian random variables. This input
is circular for $\rho = \sqrt{2}/2$ and highly non-circular if $\rho$
approaches 0 or 1 [17]. The aim of channel equalization is to
construct an inverse filter which, taking the output $r(n)$,
reproduces the original input signal with as low an error rate as
possible. To this end we apply the NCKLMS algorithm to the set of
samples

$$\left( (r(n+D), r(n+D-1), \ldots, r(n+D-L))^T, s(n) \right),$$

where $L > 0$ is the filter length and $D$ the equalization time
delay. Experiments were conducted on a set of 5000 samples of the
input signal (13), considering both the circular and the non-circular
case. The results are compared with the NCLMS and the WL-NCLMS
algorithms. In all algorithms the step update parameter $\mu$ is
tuned for the best possible results. The time delay $D$ was also set
optimally. Figures 1 and 2 show the learning curves of the NCKLMS
using the Gaussian kernel
$\kappa(\mathbf{x}, \mathbf{y}) = \exp(-\|\mathbf{x} - \mathbf{y}\|^2 / \sigma^2)$
(with $\sigma = 5$), compared with the NCLMS and the WL-NCLMS
algorithms. The novelty criterion was applied to the NCKLMS for
sparsification, with $\delta_1 = 0.15$ and $\delta_2 = 0.2$. In both
examples, NCKLMS considerably outperforms both the NCLMS and the
WL-NCLMS algorithms. However, this enhanced behavior comes at a price
in computational complexity, since the NCKLMS requires the evaluation
of the kernel function on a growing number of training examples.
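The data generation for this experiment can be sketched as follows. Several details are assumptions of this illustration because the text does not state them: $X(n)$ and $Y(n)$ are taken as independent with unit variance, $s(-1)$ is taken as zero, the noise standard deviation `noise_std` is a free parameter, and the regressor is built exactly as the sample formula is written, so it has $L + 1$ entries.

```python
import math
import random

def make_input(n_samples, rho, rng):
    """s(n) = 0.70 (sqrt(1 - rho^2) X(n) + i rho Y(n)); X, Y assumed standard Gaussian."""
    scale = math.sqrt(1.0 - rho ** 2)
    return [0.70 * complex(scale * rng.gauss(0.0, 1.0), rho * rng.gauss(0.0, 1.0))
            for _ in range(n_samples)]

def channel(s, noise_std, rng):
    """Linear filter, memoryless nonlinearity, then additive white Gaussian noise."""
    r = []
    for n in range(len(s)):
        prev = s[n - 1] if n > 0 else 0j          # assumption: s(-1) = 0
        t = (-0.9 + 0.8j) * s[n] + (0.6 - 0.7j) * prev
        q = t + (0.1 + 0.15j) * t ** 2 + (0.06 + 0.05j) * t ** 3
        noise = complex(rng.gauss(0.0, noise_std), rng.gauss(0.0, noise_std))
        r.append(q + noise)
    return r

def training_pairs(r, s, L=5, D=2):
    """Pairs ((r(n+D), r(n+D-1), ..., r(n+D-L))^T, s(n)) for every valid n."""
    pairs = []
    for n in range(max(0, L - D), len(s) - D):
        z = [r[n + D - k] for k in range(L + 1)]
        pairs.append((z, s[n]))
    return pairs

rng = random.Random(0)                            # fixed seed for repeatability
s = make_input(5000, rho=0.1, rng=rng)
r = channel(s, noise_std=0.1, rng=rng)
pairs = training_pairs(r, s)
print(len(pairs))                                 # 4995 valid (regressor, target) pairs
```

The resulting `pairs` list has the (z, d) shape that an equalizer such as the NCKLMS expects as input.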

8. CONCLUSIONS 

A new framework for kernel adaptive filtering for complex signal
processing was developed. The proposed methodology employs a
technique called complexification of RKHSs to construct complex
RKHSs from real ones, providing the advantage of working with some
popular real kernels in the complex domain. It has to be pointed out
that our method is a general one and can be used with any type of
complex kernels that have been or can be developed. To the best of
our knowledge, this is the first time that a methodology for complex
adaptive processing in RKHSs is proposed. Wirtinger's calculus has
been extended to cope with the problem of differentiation in the
involved (infinite) dimensional Hilbert spaces. The derived rules and
properties of the extended Wirtinger's calculus on complex RKHSs turn
out to be similar in structure to the special case of finite
dimensional complex spaces. The proposed framework was applied to the
complex LMS and the new complex Kernel LMS algorithm was developed.
Experiments, which were performed on the equalization problem of a
nonlinear channel for both circular and non-circular input data,
showed a significant decrease in the steady state mean square error,
compared with complex LMS and widely linear complex LMS.

Fig. 3. The equalization problem.